Pick one CUDA primitive at a time, then map the same hardware floorplan to CNN and Transformer workloads.
Pick a call from the list on the left.
Switch tabs to map the same hardware onto each workload. Time runs top to bottom, and each step shows what each hardware unit is doing on its own.